The IREX 10: Identification Track assesses iris recognition performance for identification (a.k.a. one-to-many) applications. Most flagship deployments of iris recognition operate in identification mode, providing services that include prison management, border security, expedited processing, and distribution of resources. The evaluation is administered at the Image Group’s Biometrics Research Lab (BRL), where developers submit their matching software for testing over sequestered iris data. Because the evaluation is ongoing, developers may submit at any time.
Two-eye Accuracy:
| Accuracy Metric: | FNIR (i.e., “miss rate”) at an FPIR of 0.01 (± 90% confidence) |
| Dataset: | Operational Dataset 4th pull |
| Samples used: | Both eyes |
| Enrolled Population: | 500K people |
| Enrollment Method: | One enrollment session per person |
The number after the ± indicates either the 90% confidence interval (for accuracy) or the standard deviation (for times and sizes).
Single-eye Accuracy:
| Accuracy Metric: | FNIR (i.e., “miss rate”) at an FPIR of 0.01 (± 90% confidence) |
| Dataset: | Operational Dataset 4th pull |
| Samples used: | One eye |
| Enrolled Population: | 1M irides (500K people) |
| Enrollment Method: | One enrollment session per eye |
Core accuracy for the identification task can be characterized by Detection-error trade-off (DET) plots. Generally, curves lower down in a DET plot correspond to more accurate matchers. The plots are interactive through the use of the Plotly.js graphing library.
| Dataset: | Operational Dataset 4th pull |
| Samples used: | Both eyes |
| Enrolled Population: | 500K people |
| Enrollment Method: | One enrollment session per person |
| Dataset: | Operational Dataset 4th pull |
| Samples used: | One eye |
| Enrolled Population: | 1M irides (500K people) |
| Enrollment Method: | One enrollment session per eye |
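As a rough illustration of how a single DET operating point (FNIR at a fixed FPIR) can be derived from search scores, the sketch below picks the threshold from non-mated searches and then counts misses among mated searches. It is not the evaluation's own code. The function name, the input layout (one top-candidate score per non-mated search, one true-mate score per mated search), and the convention that higher scores mean greater similarity are all assumptions; for matchers that report dissimilarity, the comparisons would flip.

```python
import math

def fnir_at_fpir(mated_scores, nonmated_top_scores, target_fpir):
    """Return (threshold, fpir, fnir) for the lowest threshold whose FPIR
    does not exceed target_fpir.

    Assumes similarity scores (higher = more alike). A search counts as
    returning a candidate only when its score is strictly above the threshold.
    """
    ranked = sorted(nonmated_top_scores, reverse=True)
    n = len(ranked)
    # Number of non-mated searches allowed to exceed the threshold.
    allowed = math.floor(target_fpir * n + 1e-9)
    threshold = ranked[allowed] if allowed < n else float("-inf")
    # Fraction of non-mated searches that still produce a (false) candidate.
    fpir = sum(s > threshold for s in ranked) / n
    # Fraction of mated searches whose true mate fails to clear the threshold.
    fnir = sum(s <= threshold for s in mated_scores) / len(mated_scores)
    return threshold, fpir, fnir

# Hypothetical toy scores, for illustration only.
threshold, fpir, fnir = fnir_at_fpir(
    mated_scores=[95, 85, 70, 99],
    nonmated_top_scores=[10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    target_fpir=0.2,
)
print(threshold, fpir, fnir)
```

Sweeping `target_fpir` over a range of values and plotting the resulting (FPIR, FNIR) pairs would trace out a DET curve.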
Rank-based metrics are generally better at reflecting performance for investigational tasks, where the algorithm returns a list of candidates for an inspector to scrutinize further. The rank 10 “hit rate” is the fraction of searches that return the correct candidate within the top 10 candidates. The miss rate is one minus the hit rate.
| Dataset: | Operational Dataset 4th pull |
| Samples used: | Both eyes |
| Enrolled Population: | 500K people |
| Enrollment Method: | One enrollment session per person |
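The rank-based hit rate described above can be sketched in a few lines. The function name and the input layout (one ranked candidate list per mated search, plus the identity of the true mate) are hypothetical, chosen only to illustrate the definition.

```python
def rank_k_hit_rate(candidate_lists, true_ids, k=10):
    """Fraction of mated searches whose true mate appears in the top k candidates.

    candidate_lists[i] is the ranked candidate list (best first) for search i;
    true_ids[i] is the enrolled identity that search i should return.
    """
    hits = sum(
        1 for candidates, truth in zip(candidate_lists, true_ids)
        if truth in candidates[:k]
    )
    return hits / len(true_ids)

# Hypothetical toy data: four searches, evaluated at rank 2.
candidate_lists = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"], ["x", "y", "z"]]
true_ids = ["a", "f", "q", "y"]
hit_rate = rank_k_hit_rate(candidate_lists, true_ids, k=2)
miss_rate = 1.0 - hit_rate  # miss rate is one minus the hit rate
print(hit_rate, miss_rate)
```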
Computation times are measured as elapsed real time (i.e., “wall clock” time) rather than CPU time. Timing estimates were computed on unloaded machines with only a single process dedicated to biometric operations. The test machines are Dell PowerEdge M910 blades with dual Intel Xeon X7560 2.3 GHz CPUs (eight cores per processor).
| Dataset: | Operational Dataset 4th pull |
| Samples used: | Both eyes |
| Enrolled Population: | 500K people |
| Enrollment Method: | One enrollment session per person |
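To make the distinction between wall-clock time and CPU time concrete, the following sketch times a stand-in search function both ways and reports the median of each. It is only an illustration: `median_search_times` and `fake_search` are hypothetical names, and the evaluation's actual timing harness is not described here.

```python
import statistics
import time

def median_search_times(search_fn, probes):
    """Return (median wall-clock seconds, median CPU seconds) per search."""
    wall, cpu = [], []
    for probe in probes:
        wall_start = time.perf_counter()   # elapsed real time
        cpu_start = time.process_time()    # CPU time used by this process
        search_fn(probe)
        cpu.append(time.process_time() - cpu_start)
        wall.append(time.perf_counter() - wall_start)
    return statistics.median(wall), statistics.median(cpu)

def fake_search(probe):
    # Stand-in for a matcher call: sleeping consumes wall-clock time
    # but almost no CPU time.
    time.sleep(0.01)

wall_median, cpu_median = median_search_times(fake_search, range(5))
print(f"median wall-clock: {wall_median:.4f} s, median CPU: {cpu_median:.4f} s")
```

A search that waits on I/O or on other threads looks cheap in CPU time but not in wall-clock time, which is why elapsed real time is the more relevant figure for an operator waiting on a result.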
Previous IREX evaluations identified a speed-accuracy trade-off whereby the more accurate matchers tend to take longer to return search results. The plot below shows FNIR as a function of median search time for each matcher. FNIR is computed at an FPIR of \(0.01\).
| Dataset: | Operational Dataset 4th pull |
| Samples used: | Both eyes |
| Enrolled Population: | 500K people |
| Enrollment Method: | One enrollment session per person |
The figure below shows the FNIR at FPIR=0.01 (t = 0.702) for different demographic groups. The bars show 95% confidence intervals.
| Dataset: | Operational Dataset 4th pull |
| Samples used: | Both eyes |
| Enrolled Population: | 500K people |
| Enrollment Method: | One enrollment session per person |
Some consolidation of demographic information was necessary to improve statistical power. Eye color was consolidated to either light (grey, blue, or green) or dark (brown or black). Some subjects were labeled as being neither male nor female. Meaningful results for these categories could not be obtained because their sample sizes are too small. For the same reason, results for races other than white and black are not shown. The precise definitions of race, sex, and eye color used here can be found in EBTS version 10.0.
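The eye-color consolidation described above amounts to a simple lookup. The sketch below uses plain-English labels as keys; these strings are hypothetical placeholders and are not the codes defined in EBTS.

```python
# Hypothetical raw labels mapped to the two consolidated groups.
EYE_COLOR_GROUPS = {
    "grey": "light",
    "blue": "light",
    "green": "light",
    "brown": "dark",
    "black": "dark",
}

def consolidate_eye_color(label):
    """Map a raw eye-color label to 'light' or 'dark'.

    Returns None for labels outside the five colors listed above, so that
    such records can be excluded or handled separately.
    """
    return EYE_COLOR_GROUPS.get(label.strip().lower())

print(consolidate_eye_color("Blue"), consolidate_eye_color("brown"))
```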
This section models the relationship between FNIR and various demographic characteristics using logistic regression. The response variable is whether the search produces a false negative at an FPIR of 0.01. The logit relationship is \(\ell = \ln\left(\frac{p}{1-p}\right)\), where \(p\) is the probability of a false negative and \(\ell\) is the log-odds of a false negative, which the model expresses as a linear combination of the demographic predictors.
McFadden’s pseudo-\(R^2\) = 0.0000438
n = 311,452
Negative (blue) values mean the probability of a miss is decreased. McFadden’s pseudo-\(R^2\) is a goodness-of-fit measure that takes values between 0 and 1. Race, sex, and eye color are generally poor predictors of accuracy, so the value is typically low.
The model does not include any interactions between race and eye color because there were not enough cases of black subjects with light eyes to produce meaningful results. Eye color was unavailable for some subjects, so multiple imputation by chained equations (MICE) was used to impute the missing values.
Other races, sexes, and eye colors are excluded because they occur too infrequently in the test dataset.
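For readers unfamiliar with McFadden’s pseudo-\(R^2\), the sketch below computes it as one minus the ratio of the fitted model's log-likelihood to that of an intercept-only model. To stay self-contained it makes a simplifying assumption: the model has a single categorical predictor, in which case the maximum-likelihood fitted probabilities of a logistic regression equal the per-group miss rates, so no iterative fitting is needed. The function names and the toy data are hypothetical, and the model in this section has several predictors.

```python
import math

def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def bernoulli_log_likelihood(outcomes, probs):
    """Log-likelihood of 0/1 outcomes given per-observation probabilities."""
    return sum(math.log(p) if y else math.log(1.0 - p)
               for y, p in zip(outcomes, probs))

def mcfadden_r2(outcomes, groups):
    """McFadden's pseudo-R^2 for a logistic model with one categorical predictor.

    outcomes[i] is 1 if search i was a false negative, else 0.
    groups[i] is the demographic group label of search i.
    Every group must contain both hits and misses so the logs are finite.
    """
    n = len(outcomes)
    overall_rate = sum(outcomes) / n
    ll_null = bernoulli_log_likelihood(outcomes, [overall_rate] * n)
    totals, misses = {}, {}
    for y, g in zip(outcomes, groups):
        totals[g] = totals.get(g, 0) + 1
        misses[g] = misses.get(g, 0) + y
    # For a single categorical predictor the fitted probabilities are group rates.
    fitted = [misses[g] / totals[g] for g in groups]
    ll_model = bernoulli_log_likelihood(outcomes, fitted)
    return 1.0 - ll_model / ll_null

# Hypothetical toy data: group A misses 2 of 100 searches, group B misses 4 of 100.
outcomes = [1] * 2 + [0] * 98 + [1] * 4 + [0] * 96
groups = ["A"] * 100 + ["B"] * 100
print(mcfadden_r2(outcomes, groups))
# The coefficient for group B relative to A is the difference in log-odds.
print(logit(4 / 100) - logit(2 / 100))
```

Even when one group's miss rate is double the other's, the pseudo-\(R^2\) stays small because misses are rare in both groups, which is consistent with the low values reported here.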
The figure below shows the FNIR at FPIR=0.01 (t = 2700) for different demographic groups. The bars show 95% confidence intervals.
| Dataset: | Operational Dataset 4th pull |
| Samples used: | Both eyes |
| Enrolled Population: | 500K people |
| Enrollment Method: | One enrollment session per person |
McFadden’s pseudo-\(R^2\) = 0.0000438
n = 311,452
The figure below shows the FNIR at FPIR=0.01 (t = 0.00917) for different demographic groups. The bars show 95% confidence intervals.
| Dataset: | Operational Dataset 4th pull |
| Samples used: | Both eyes |
| Enrolled Population: | 500K people |
| Enrollment Method: | One enrollment session per person |
McFadden’s pseudo-\(R^2\) = 0.0000438
n = 311,452
The figure below shows the FNIR at FPIR=0.01 (t = 0.0095) for different demographic groups. The bars show 95% confidence intervals.
| Dataset: | Operational Dataset 4th pull |
| Samples used: | Both eyes |
| Enrolled Population: | 500K people |
| Enrollment Method: | One enrollment session per person |
McFadden’s pseudo-\(R^2\) = 0.0000438
n = 311,452
The figure below shows the FNIR at FPIR=0.01 (t = 0.371) for different demographic groups. The bars show 95% confidence intervals.
| Dataset: | Operational Dataset 4th pull |
| Samples used: | Both eyes |
| Enrolled Population: | 500K people |
| Enrollment Method: | One enrollment session per person |
McFadden’s pseudo-\(R^2\) = 0.0000438
n = 311,452
The figure below shows the FNIR at FPIR=0.01 (t = 0.38) for different demographic groups. The bars show 95% confidence intervals.
| Dataset: | Operational Dataset 4th pull |
| Samples used: | Both eyes |
| Enrolled Population: | 500K people |
| Enrollment Method: | One enrollment session per person |
McFadden’s pseudo-\(R^2\) = 0.0000438
n = 311,452
The figure below shows the FNIR at FPIR=0.01 (t = 0.27) for different demographic groups. The bars show 95% confidence intervals.
| Dataset: | Operational Dataset 4th pull |
| Samples used: | Both eyes |
| Enrolled Population: | 500K people |
| Enrollment Method: | One enrollment session per person |
McFadden’s pseudo-\(R^2\) = 0.0000536
n = 311,452
The figure below shows the FNIR at FPIR=0.01 (t = 0.462) for different demographic groups. The bars show 95% confidence intervals.
| Dataset: | Operational Dataset 4th pull |
| Samples used: | Both eyes |
| Enrolled Population: | 500K people |
| Enrollment Method: | One enrollment session per person |
McFadden’s pseudo-\(R^2\) = 0.0000438
n = 311,452
The figure below shows the FNIR at FPIR=0.01 (t = 0.143) for different demographic groups. The bars show 95% confidence intervals.
| Dataset: | Operational Dataset 4th pull |
| Samples used: | Both eyes |
| Enrolled Population: | 500K people |
| Enrollment Method: | One enrollment session per person |
McFadden’s pseudo-\(R^2\) = 0.0000438
n = 311,452
The figure below shows the FNIR at FPIR=0.01 (t = 64) for different demographic groups. The bars show 95% confidence intervals.
| Dataset: | Operational Dataset 4th pull |
| Samples used: | Both eyes |
| Enrolled Population: | 500K people |
| Enrollment Method: | One enrollment session per person |
McFadden’s pseudo-\(R^2\) = 0.0000438
n = 311,452
The figure below shows the FNIR at FPIR=0.01 (t = 65) for different demographic groups. The bars show 95% confidence intervals.
| Dataset: | Operational Dataset 4th pull |
| Samples used: | Both eyes |
| Enrolled Population: | 500K people |
| Enrollment Method: | One enrollment session per person |
McFadden’s pseudo-\(R^2\) = 0.0000438
n = 311,452
The figure below shows the FNIR at FPIR=0.01 (t = 0.295) for different demographic groups. The bars show 95% confidence intervals.
| Dataset: | Operational Dataset 4th pull |
| Samples used: | Both eyes |
| Enrolled Population: | 500K people |
| Enrollment Method: | One enrollment session per person |
McFadden’s pseudo-\(R^2\) = 0.0000438
n = 311,452
The figure below shows the FNIR at FPIR=0.01 (t = 0.299) for different demographic groups. The bars show 95% confidence intervals.
| Dataset: | Operational Dataset 4th pull |
| Samples used: | Both eyes |
| Enrolled Population: | 500K people |
| Enrollment Method: | One enrollment session per person |
McFadden’s pseudo-\(R^2\) = 0.0000438
n = 311,452
The figure below shows the FNIR at FPIR=0.01 (t = 37) for different demographic groups. The bars show 95% confidence intervals.
| Dataset: | Operational Dataset 4th pull |
| Samples used: | Both eyes |
| Enrolled Population: | 500K people |
| Enrollment Method: | One enrollment session per person |
McFadden’s pseudo-\(R^2\) = 0.0000438
n = 311,452
The figure below shows the FNIR at FPIR=0.01 (t = 0.529) for different demographic groups. The bars show 95% confidence intervals.
| Dataset: | Operational Dataset 4th pull |
| Samples used: | Both eyes |
| Enrolled Population: | 500K people |
| Enrollment Method: | One enrollment session per person |
McFadden’s \(R^2\) = 0.0004185
n = 311,452
The figure below shows the FNIR at FPIR=0.01 (t = 0.53) for different demographic groups. The bars show 95% confidence intervals.
| Dataset: | Operational Dataset 4th pull |
| Samples used: | Both eyes |
| Enrolled Population: | 500K people |
| Enrollment Method: | One enrollment session per person |
McFadden’s \(R^2\) = 0.0000438
n = 311,452
The figure below shows the FNIR at FPIR=0.01 (t = 1228) for different demographic groups. The bars show 95% confidence intervals.
| Dataset: | Operational Dataset 4th pull |
| Samples used: | Both eyes |
| Enrolled Population: | 500K people |
| Enrollment Method: | One enrollment session per person |
McFadden’s \(R^2\) = 0.0000438
n = 311,452
The figure below shows the FNIR at FPIR=0.01 (t = 0.197) for different demographic groups. The bars show 95% confidence intervals.
| Dataset: | Operational Dataset 4th pull |
| Samples used: | Both eyes |
| Enrolled Population: | 500K people |
| Enrollment Method: | One enrollment session per person |
McFadden’s \(R^2\) = 0.0000438
n = 311,452
Some participants’ submissions output estimates of sample quality for each processed iris image. The ANSI/NIST-ITL 1-2011 standard requires these estimates to be in the range 0 to 100 and to quantitatively express the predicted matching performance of the sample. Error-reject rate curves show how FNIR can be reduced by discarding the poorest quality samples in the test data. In our case, the quality of a search was set to the minimum of the quality values assigned to the searched image and its enrolled mate.
The figure below demonstrates that FNIR (i.e. the ‘miss rate’) can be reduced by almost 20% by discarding just 1% of the poorest quality searches. Presumably, this 1% involved samples where the subject was blinking, moving, looking off-axis at the moment of capture, etc. The IREX III supplemental failure analysis found that matching failures for the most accurate matchers over a different dataset were almost entirely due to poor presentation of the iris.
| Dataset: | Operational Dataset 4th pull |
| Samples used: | One eye |
| Enrolled Population: | 1M irides (500K people) |
| Enrollment Method: | One enrollment session per eye |
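The error-reject computation described above can be sketched as follows. The quality of each search is taken as the minimum of the search-image and mate-image quality, the lowest-quality fraction is discarded, and FNIR is recomputed on the remainder. The tuple layout and toy values are hypothetical, and tie handling among equal quality values is a simplification.

```python
def error_reject_curve(searches, reject_fractions):
    """Compute FNIR after discarding the poorest-quality fraction of searches.

    searches: list of (search_quality, mate_quality, missed) tuples, where
    missed is True if the search failed to return the correct mate.
    Returns a list of (reject_fraction, fnir) pairs.
    """
    # Quality of a search = min(quality of searched image, quality of enrolled mate).
    scored = sorted(((min(sq, mq), missed) for sq, mq, missed in searches),
                    key=lambda item: item[0])
    curve = []
    for fraction in reject_fractions:
        n_drop = int(fraction * len(scored))
        kept = scored[n_drop:]
        fnir = sum(1 for _, missed in kept if missed) / len(kept)
        curve.append((fraction, fnir))
    return curve

# Hypothetical toy data: the two misses coincide with the two poorest searches.
toy_searches = [
    (5, 80, True), (90, 10, True), (50, 60, False), (70, 65, False),
    (80, 85, False), (95, 90, False), (60, 75, False), (88, 92, False),
    (77, 79, False), (99, 98, False),
]
curve = error_reject_curve(toy_searches, [0.0, 0.1, 0.2])
print(curve)
```

A quality measure that predicts matching performance well produces a curve that drops steeply at small reject fractions.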
The stacked barplot below shows how sample quality impacts the probability that a search will miss (i.e. fail to return the correct mate). Samples assigned low quality values should be more likely to miss. For Neurotechnology’s matcher, when the assigned value is 0 the probability of a miss is greater than 50%. FPIR is set to \(0.01\).
| Dataset: | Operational Dataset 4th pull |
| Samples used: | One eye |
| Enrolled Population: | 1M irides (500K people) |
| Enrollment Method: | One enrollment session per eye |
The sample qualities of left and right iris images acquired during the same session are expected to be highly correlated. In addition to having similar capture environments, dual-eye cameras acquire both images at nearly the same instant, so poor presentation of the irides at the moment of capture (e.g. blinking or moving) detrimentally affects both images. For this reason, matching both acquired images rather than just one yields only a moderate improvement in accuracy. The figure below shows the distribution of qualities, with each axis representing the quality of one of the iris images (left or right) acquired during the same capture session.
| Dataset: | Operational Dataset 4th pull |
| Samples used: | One eye |
| Enrolled Population: | 1M irides (500K people) |
| Enrollment Method: | One enrollment session per eye |
The acquisition protocol for OPS4 images has probably improved over time. Better iris cameras and capture environments are likely to have improved the quality of the acquired images. Iris recognition accuracy is highly dependent on the prevalence of very poor quality samples. Misses tend to occur when the subject was blinking, moving, looking off-axis (etc.) at the instant of capture. The figure below shows the prevalence of these very low quality samples in OPS4 for each capture year. Comparatively few images in OPS4 were collected prior to 2014 so results for these images are omitted. An iris sample was deemed to have very low quality if its quality value is among the lowest 2% (i.e. below the 2% quantile) of all images in OPS4.
| Dataset: | Operational Dataset 4th pull |
| Samples used: | One eye |
| Enrolled Population: | 1M irides (500K people) |
| Enrollment Method: | One enrollment session per eye |
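The prevalence calculation described above can be sketched as follows: a single quality threshold is taken at the 2% quantile over all images, and the fraction of images below that threshold is then computed per capture year. The empirical quantile rule and the toy data are assumptions for illustration; with heavily tied quality values, fewer than 2% of images may fall strictly below the threshold.

```python
from collections import defaultdict

def low_quality_prevalence(samples, quantile=0.02):
    """Fraction of very-low-quality images per capture year.

    samples: list of (capture_year, quality) pairs.
    Returns (threshold, {year: fraction of images with quality < threshold}).
    """
    qualities = sorted(q for _, q in samples)
    # Simple empirical quantile: the value at position floor(quantile * n).
    threshold = qualities[int(quantile * len(qualities))]
    totals = defaultdict(int)
    low = defaultdict(int)
    for year, quality in samples:
        totals[year] += 1
        if quality < threshold:
            low[year] += 1
    return threshold, {year: low[year] / totals[year] for year in sorted(totals)}

# Hypothetical toy data: 100 images, the two poorest both captured in 2014.
toy_samples = [(2014 if i < 50 else 2015, i) for i in range(100)]
threshold, prevalence = low_quality_prevalence(toy_samples)
print(threshold, prevalence)
```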
Combining the results from multiple submissions sometimes yields improved accuracy over individual submissions. In this section, score-level fusion is used to combine search results from multiple submissions. Equal-weighted Neyman-Pearson fusion is used to merge candidate lists from different submissions into a single consolidated candidate list. The dissimilarity score associated with each candidate is normalized prior to fusion (see LFAR score). This normalized score is a measure of similarity rather than dissimilarity. Any candidate appearing on multiple lists is assigned a single fused score by summing the individual LFAR scores. The merged candidate list is then reordered by the LFAR scores.
Only fusion results that yield an improvement in accuracy over the individual submissions are shown.
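The merging step can be sketched as below, assuming the scores on each candidate list have already been converted to LFAR similarity scores. The normalization itself is not reproduced here, and the candidate identifiers and score values are hypothetical.

```python
from collections import defaultdict

def fuse_candidate_lists(candidate_lists):
    """Equal-weighted fusion of candidate lists by summing normalized scores.

    candidate_lists: one list per submission of (candidate_id, lfar_score)
    pairs, where a higher score means greater similarity.
    Returns a single list of (candidate_id, fused_score), best candidate first.
    """
    fused = defaultdict(float)
    for candidates in candidate_lists:
        for candidate_id, score in candidates:
            # A candidate on several lists accumulates the sum of its scores.
            fused[candidate_id] += score
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Hypothetical lists from two submissions for the same search.
list_a = [("A", 3.0), ("B", 2.5)]
list_b = [("B", 2.0), ("C", 1.0)]
merged = fuse_candidate_lists([list_a, list_b])
print(merged)
```

In this toy case candidate B moves to the top because it appears on both lists, which is the intended effect of summing scores across submissions.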
Accuracy is affected by the size of the enrollment database (a.k.a. the gallery size). Identifying the correct mate is expected to be more difficult for larger enrollment databases. The figure below plots FNIR (at FPIR=\(0.01\)) as a function of enrollment database size.
| Dataset: | Operational Dataset 4th pull (Stats on OPS4 images) |
| Accuracy Metric: | FNIR (i.e., “miss rate”) at an FPIR of 0.01 |
| Samples used: | Both eyes |
| Enrollment Method: | One enrollment session per person |
Some apparent trends may be the result of random variation. Results for the 10K and 50K enrollment sizes were computed from 140K searches. Results for the 100K and 500K enrollment sizes were computed from 700K searches.
Between 2010 and 2018, West Virginia University and the University of Notre Dame collected iris images of identical and mirror twins during the annual Twinsday Festival. The data collection procedure is described in Sabatier et al. Many twins participated in the data collection in multiple years. In all, \(5,078\) iris images from \(691\) twins were used to produce the results below.
The comparison scores were collected as follows: all available images were enrolled in a database; the same set of images was then searched against the database, producing a total of \(5,078 \times 5,078 \approx 25.8\) million scores, including \(72,587\) twins scores, \(75,651\) cross-eye (i.e. left-vs-right irises from the same person) scores, and 25.5 million nonmated scores. The scores are not truly one-to-one if the submission performs enrollment-side score or template normalization.
NOTE: Some plots may not render well if the matcher produces highly discretized scores.
Histograms of Score Distributions
Cumulative Score Distributions
The shading shows an estimate of the 95% confidence interval for twins comparisons.
One-to-many Matching
The impact of twins on one-to-many accuracy is also assessed. An enrollment database of one million eyes was generated, padded with images from the OPS4 dataset to reach the targeted enrollment size. \(4,384\) searches were performed where the searched person’s twin was enrolled. If twins detrimentally impact matching accuracy, they should be more likely to cause false positives. The table below shows how frequently twins from the \(4,384\) searches contribute to false positives.
Number of times twin appears at rank 10 or above: 392
| Samples used: | One eye |
| Enrolled Population: | 1 million iris images |
| Number of Searches: | 4,384 |
Histograms of Score Distributions
Cumulative Score Distributions
The shading shows an estimate of the 95% confidence interval for twins comparisons.
One-to-many Matching
Number of times twin appears at rank 10 or above: 391
| Samples used: | One eye |
| Enrolled Population: | 1 million iris images |
| Number of Searches: | 4,384 |
Histograms of Score Distributions
Cumulative Score Distributions
The shading shows an estimate of the 95% confidence interval for twins comparisons.
One-to-many Matching
Number of times twin appears at rank 10 or above: 0
| Samples used: | One eye |
| Enrolled Population: | 1 million iris images |
| Number of Searches: | 4,384 |
Histograms of Score Distributions
Cumulative Score Distributions
The shading shows an estimate of the 95% confidence interval for twins comparisons.
One-to-many Matching
Number of times twin appears at rank 10 or above: 14
| Samples used: | One eye |
| Enrolled Population: | 1 million iris images |
| Number of Searches: | 4,384 |
Histograms of Score Distributions
Cumulative Score Distributions
The shading shows an estimate of the 95% confidence interval for twins comparisons.
One-to-many Matching
Number of times twin appears at rank 10 or above: 3
| Samples used: | One eye |
| Enrolled Population: | 1 million iris images |
| Number of Searches: | 4,384 |
Histograms of Score Distributions
Cumulative Score Distributions
The shading shows an estimate of the 95% confidence interval for twins comparisons.
One-to-many Matching
Number of times twin appears at rank 10 or above: 2
| Samples used: | One eye |
| Enrolled Population: | 1 million iris images |
| Number of Searches: | 4,384 |
Histograms of Score Distributions
Cumulative Score Distributions
The shading shows an estimate of the 95% confidence interval for twins comparisons.
One-to-many Matching
Number of times twin appears at rank 10 or above: 4
| Samples used: | One eye |
| Enrolled Population: | 1 million iris images |
| Number of Searches: | 4,384 |
Histograms of Score Distributions
Cumulative Score Distributions
The shading shows an estimate of the 95% confidence interval for twins comparisons.
One-to-many Matching
Number of times twin appears at rank 10 or above: 0
| Samples used: | One eye |
| Enrolled Population: | 1 million iris images |
| Number of Searches: | 4,384 |
Histograms of Score Distributions
Cumulative Score Distributions
The shading shows an estimate of the 95% confidence interval for twins comparisons.
One-to-many Matching
Number of times twin appears at rank 10 or above: 0
| Samples used: | One eye |
| Enrolled Population: | 1 million iris images |
| Number of Searches: | 4,384 |
Histograms of Score Distributions
Cumulative Score Distributions
The shading shows an estimate of the 95% confidence interval for twins comparisons.
One-to-many Matching
Number of times twin appears at rank 10 or above: 2
| Samples used: | One eye |
| Enrolled Population: | 1 million iris images |
| Number of Searches: | 4,384 |
Histograms of Score Distributions
Cumulative Score Distributions
The shading shows an estimate of the 95% confidence interval for twins comparisons.
One-to-many Matching
Number of times twin appears at rank 10 or above: 5
| Samples used: | One eye |
| Enrolled Population: | 1 million iris images |
| Number of Searches: | 4,384 |
Histograms of Score Distributions
Cumulative Score Distributions
The shading shows an estimate of the 95% confidence interval for twins comparisons.
One-to-many Matching
Number of times twin appears at rank 10 or above: 1
| Samples used: | One eye |
| Enrolled Population: | 1 million iris images |
| Number of Searches: | 4,384 |
Number of times twin appears at rank 10 or above: 6
| Samples used: | One eye |
| Enrolled Population: | 1 million iris images |
| Number of Searches: | 4,384 |
Number of times twin appears at rank 10 or above: 6
| Samples used: | One eye |
| Enrolled Population: | 1 million iris images |
| Number of Searches: | 4,384 |
Number of times twin appears at rank 10 or above: 175
| Samples used: | One eye |
| Enrolled Population: | 1 million iris images |
| Number of Searches: | 4,384 |
Number of times twin appears at rank 10 or above: 0
| Samples used: | One eye |
| Enrolled Population: | 1 million iris images |
| Number of Searches: | 4,384 |
Number of times twin appears at rank 10 or above: 0
| Samples used: | One eye |
| Enrolled Population: | 1 million iris images |
| Number of Searches: | 4,384 |
Number of times twin appears at rank 10 or above: 349
| Samples used: | One eye |
| Enrolled Population: | 1 million iris images |
| Number of Searches: | 4,384 |
Participants are allowed to submit an implementation once every 3 calendar months.
Inquiries, comments, and recommendations may be sent to irex@nist.gov. Subscribe to the IREX mailing list to stay up-to-date on all IREX-related activities.